10 research outputs found

    Dimensionality Reduction using PCA and K-Means Clustering for Breast Cancer Prediction

    Breast cancer is a leading cause of death among women. Predicting breast cancer at an early stage increases the chance of a cure. This requires a prediction tool that can classify a breast tumor as either malignant (harmful) or benign (harmless). In this paper, two machine learning algorithms, Support Vector Machine and Extreme Gradient Boosting, are compared for classification. Prior to classification, the number of data attributes is reduced by extracting features from the raw data with Principal Component Analysis. A clustering method, K-Means, is also used for dimensionality reduction alongside Principal Component Analysis. The paper compares four models, formed by combining the two dimensionality reduction methods with the two classifiers, on the Wisconsin Breast Cancer Dataset. The comparison uses accuracy, sensitivity, and specificity computed from the confusion matrices. The experimental results indicate that K-Means, which is not usually used for dimensionality reduction, can perform well compared to the popular Principal Component Analysis.
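The three metrics named in the abstract can all be read off a binary confusion matrix. The following is a minimal sketch of the standard definitions, not the authors' code. The class and function names are hypothetical, the malignant class is assumed to be encoded as the positive label `1`, and both classes are assumed to be present so that no denominator is zero.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    """Counts for a binary classifier (hypothetical helper type)."""
    tp: int  # malignant predicted as malignant
    fn: int  # malignant predicted as benign
    fp: int  # benign predicted as malignant
    tn: int  # benign predicted as benign


def confusion_matrix(y_true, y_pred, positive=1):
    """Tally the four cells from paired true and predicted labels."""
    tp = fn = fp = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1
            else:
                fn += 1
        else:
            if pred == positive:
                fp += 1
            else:
                tn += 1
    return ConfusionMatrix(tp, fn, fp, tn)


def accuracy(cm):
    """Fraction of all cases classified correctly."""
    return (cm.tp + cm.tn) / (cm.tp + cm.fn + cm.fp + cm.tn)


def sensitivity(cm):
    """Fraction of positive (malignant) cases that were detected."""
    return cm.tp / (cm.tp + cm.fn)


def specificity(cm):
    """Fraction of negative (benign) cases that were correctly cleared."""
    return cm.tn / (cm.tn + cm.fp)
```

Reporting sensitivity and specificity next to accuracy is a common choice for screening tasks, because a missed malignant tumor and a false alarm on a benign one carry different costs that accuracy alone does not separate.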

    NusaCrowd: Open Source Initiative for Indonesian NLP Resources

    We present NusaCrowd, a collaborative initiative to collect and unify existing resources for Indonesian languages, including opening access to previously non-public resources. Through this initiative, we have brought together 137 datasets and 118 standardized data loaders. The quality of the datasets has been assessed manually and automatically, and their value is demonstrated through multiple experiments. NusaCrowd's data collection enables the creation of the first zero-shot benchmarks for natural language understanding and generation in Indonesian and the local languages of Indonesia. Furthermore, NusaCrowd enables the creation of the first multilingual automatic speech recognition benchmark in Indonesian and the local languages of Indonesia. Our work strives to advance natural language processing (NLP) research for languages that are under-represented despite being widely spoken.

    WEIRD FAccTs: How Western, Educated, Industrialized, Rich, and Democratic is FAccT?

    Studies conducted on Western, Educated, Industrialized, Rich, and Democratic (WEIRD) samples are considered atypical of the world's population and may not accurately represent human behavior. In this study, we aim to quantify the extent to which the ACM FAccT conference, the leading venue in exploring Artificial Intelligence (AI) systems' fairness, accountability, and transparency, relies on WEIRD samples. We collected and analyzed 128 papers published between 2018 and 2022, accounting for 30.8% of the overall proceedings published at FAccT in those years (excluding abstracts, tutorials, and papers without human-subject studies or clear country attribution for the participants). We found that 84% of the analyzed papers were exclusively based on participants from Western countries, particularly exclusively from the U.S. (63%). Only researchers who undertook the effort to collect data about local participants through interviews or surveys added diversity to an otherwise U.S.-centric view of science. Therefore, we suggest that researchers collect data from under-represented populations to obtain an inclusive worldview. To achieve this goal, scientific communities should champion data collection from such populations and enforce transparent reporting of data biases. Comment: To appear at ACM FAccT 202

    Embryo Grading after In Vitro Fertilization using YOLO

    In vitro fertilization is an application of Assisted Reproductive Technology. This technology produces embryos outside the mother's womb by manipulating gametes outside the human body. The success rate of in vitro fertilization depends on the selection of well-graded embryos. In this study, the authors used YOLO version 3 to perform object detection, assigning a grade to each embryo image objectively. The embryo images, sourced from the Indonesian Medical Education and Research Institute, come with information on embryo quality. The authors separated the data under two schemes. The first scheme splits the data into 70% training, 15% validation, and 15% testing data. The second scheme uses Stratified K-Fold Cross-Validation with fold=3. For training, the authors set Max Batches=6000, Steps=4800,5400, Batch=64, and Subdivision=16, with image augmentation (saturation=1.5, exposure=1.5, hue=0.1, jitter=0.3, random=1). The mAP (Mean Average Precision) obtained under the first scheme is 100.00% at the 6000th iteration, while under the second scheme the highest mAP is 97.33%, with fold=3 at the 5000th iteration. Both separation schemes are therefore sufficient in terms of mAP.
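The second separation scheme, stratified k-fold cross-validation, can be illustrated with a short standard-library sketch. This is not the authors' pipeline: the function name and the grade labels used in the usage note are hypothetical. The point is only to show how each fold keeps roughly the same class proportions as the full dataset.

```python
import random
from collections import defaultdict


def stratified_k_fold(labels, k=3, seed=0):
    """Yield (train_indices, test_indices) pairs for k stratified folds.

    Indices of each class are shuffled and dealt round-robin into the
    k folds, so every fold receives a near-equal share of every class.
    """
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)  # fixed seed keeps the folds reproducible
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)

    for i in range(k):
        test_idx = sorted(folds[i])
        train_idx = sorted(
            idx for j, fold in enumerate(folds) if j != i for idx in fold
        )
        yield train_idx, test_idx
```

With twelve images labeled, say, six "good", three "fair", and three "poor" (made-up grades), `list(stratified_k_fold(labels, k=3))` produces three folds whose test sets each hold two "good", one "fair", and one "poor" image, and every image is used for testing exactly once.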